Assignment 1 (individual)

Network Analytics

Konstantinos Paganopoulos

This is based on the data file called HW1_asset_prices.csv. This represents the price movements of a set of assets (bonds, stocks etc., their description is quite irrelevant here). Economists and investors are very interested in the correlation of asset prices, both to understand risk, as well as (hopefully) find correlation to lagged asset prices for investing. A correlation matrix with N assets is an N × N matrix of correlations. See https://en.wikipedia.org/wiki/Stock_correlation_network for some background information.

Your task is to visualize the correlation matrix in network form. You are free to use any python package for your calculations in a (numpy, scipy, pandas etc.) as you see fit. For b and c, I expect you to use the matplotlib.pyplot interface in NetworkX. You can gain the +5 point if you (i) submit in Jupyter notebook format (ii) interface to any external drawing packages such as graphviz to call one of their drawing functions.

Get a skeleton code working to do the tasks (can be done in under 10 lines of code with the right built-in functions in pandas and networkx) and then enhance and explore from there on.

a. (10 points) Calculate the correlation matrix (you may have to use your Stats and Econometrics knowledge here)

b. (10 points) With assets as nodes, plot the matrix as a network. Experiment with the different graph layouts and choose the one that you believe is best, and explain why

c. (10 points) Enhance your plot by representing the thickness of the edges and size of the nodes to give some insight.

Answers:

a.

We first import the necessary libraries. We also use the command "%matpotlib incline" in order to include our graphs in jupyter notebook.

In [1]:
# we first import the necessary libraries
%matplotlib inline
import numpy as np
import pandas as pd
import networkx as nx
from graphviz import Graph
import matplotlib.pyplot as plt

We load our data set.

In [2]:
# we load the csv file
data = pd.read_csv("HW1_asset_prices.csv", sep = ',')
In [3]:
# we see the first 5 rows of our data set
data.head()
Out[3]:
Date EOD~BND.11 EOD~DBC.11 EOD~DIA.11 EOD~EEM.11 EOD~EFA.11 EOD~EMB.11 EOD~EPP.11 EOD~EWG.11 EOD~EWI.11 ... EOD~VGK.11 EOD~VPL.11 EOD~VXX.11 EOD~XLB.11 EOD~XLE.11 EOD~XLF.11 EOD~XLK.11 EOD~XLU.11 EOD~CSJ.11 EOD~FXF.11
0 2017-11-08 81.83 16.40 235.46 46.78 69.87 114.60 47.69 33.18 30.95 ... 58.20 72.77 33.53 58.70 69.82 26.25 64.01 55.70 104.96 94.5100
1 2017-11-07 81.89 16.43 235.42 46.56 69.64 114.65 47.22 33.07 31.09 ... 58.17 72.20 33.52 58.64 70.16 26.38 63.66 55.66 105.01 94.5400
2 2017-11-06 81.86 16.53 235.41 46.86 69.90 115.26 47.20 33.34 31.22 ... 58.67 71.98 33.34 58.58 70.25 26.75 63.63 55.00 105.00 94.7500
3 2017-11-03 81.80 16.22 235.18 46.34 69.80 115.42 47.09 33.39 31.22 ... 58.58 71.88 33.66 58.83 68.68 26.78 63.49 55.21 105.00 94.4400
4 2017-11-02 81.73 16.12 234.96 46.58 69.91 116.15 47.31 33.50 31.43 ... 58.69 71.89 33.71 58.86 68.48 26.89 62.99 55.01 105.04 94.6299

5 rows × 40 columns

We drop the first column:

In [4]:
# we drop the first column
df = data.drop('Date', axis = 1)
df.head()
Out[4]:
EOD~BND.11 EOD~DBC.11 EOD~DIA.11 EOD~EEM.11 EOD~EFA.11 EOD~EMB.11 EOD~EPP.11 EOD~EWG.11 EOD~EWI.11 EOD~EWJ.11 ... EOD~VGK.11 EOD~VPL.11 EOD~VXX.11 EOD~XLB.11 EOD~XLE.11 EOD~XLF.11 EOD~XLK.11 EOD~XLU.11 EOD~CSJ.11 EOD~FXF.11
0 81.83 16.40 235.46 46.78 69.87 114.60 47.69 33.18 30.95 60.02 ... 58.20 72.77 33.53 58.70 69.82 26.25 64.01 55.70 104.96 94.5100
1 81.89 16.43 235.42 46.56 69.64 114.65 47.22 33.07 31.09 59.65 ... 58.17 72.20 33.52 58.64 70.16 26.38 63.66 55.66 105.01 94.5400
2 81.86 16.53 235.41 46.86 69.90 115.26 47.20 33.34 31.22 59.18 ... 58.67 71.98 33.34 58.58 70.25 26.75 63.63 55.00 105.00 94.7500
3 81.80 16.22 235.18 46.34 69.80 115.42 47.09 33.39 31.22 59.19 ... 58.58 71.88 33.66 58.83 68.68 26.78 63.49 55.21 105.00 94.4400
4 81.73 16.12 234.96 46.58 69.91 116.15 47.31 33.50 31.43 59.05 ... 58.69 71.89 33.71 58.86 68.48 26.89 62.99 55.01 105.04 94.6299

5 rows × 39 columns

Now we calculate the correlation matrix as follows:

In [5]:
# we calculate the correlation matrix
corr_matrix = df.corr()
corr_matrix_initial = corr_matrix # store correlation matrix for future questions
corr_matrix_initial
Out[5]:
EOD~BND.11 EOD~DBC.11 EOD~DIA.11 EOD~EEM.11 EOD~EFA.11 EOD~EMB.11 EOD~EPP.11 EOD~EWG.11 EOD~EWI.11 EOD~EWJ.11 ... EOD~VGK.11 EOD~VPL.11 EOD~VXX.11 EOD~XLB.11 EOD~XLE.11 EOD~XLF.11 EOD~XLK.11 EOD~XLU.11 EOD~CSJ.11 EOD~FXF.11
EOD~BND.11 1.000000 -0.822062 0.794479 0.069905 0.114368 0.912854 0.119582 0.099310 -0.482615 0.638039 ... -0.037230 0.536324 -0.900317 0.597031 -0.613705 0.689425 0.845532 0.941027 0.934951 -0.769985
EOD~DBC.11 -0.822062 1.000000 -0.574282 0.349666 0.200079 -0.600426 0.325952 0.228763 0.600429 -0.472528 ... 0.333239 -0.215966 0.773717 -0.238640 0.895370 -0.509694 -0.644009 -0.717165 -0.713552 0.870505
EOD~DIA.11 0.794479 -0.574282 1.000000 0.468724 0.563945 0.917934 0.519725 0.568437 -0.070294 0.897045 ... 0.412195 0.875397 -0.883305 0.898652 -0.346392 0.975872 0.988677 0.910599 0.917660 -0.657030
EOD~EEM.11 0.069905 0.349666 0.468724 1.000000 0.906563 0.383581 0.968244 0.871560 0.609441 0.553773 ... 0.872187 0.788827 -0.162012 0.743330 0.522937 0.488016 0.403966 0.239615 0.227668 0.224999
EOD~EFA.11 0.114368 0.200079 0.563945 0.906563 1.000000 0.412868 0.881808 0.961832 0.722791 0.701312 ... 0.975473 0.852729 -0.274274 0.759922 0.333035 0.596364 0.504831 0.291689 0.308123 0.135446
EOD~EMB.11 0.912854 -0.600426 0.917934 0.383581 0.412868 1.000000 0.449375 0.408301 -0.245520 0.785165 ... 0.257371 0.776831 -0.917891 0.815172 -0.346823 0.842485 0.939693 0.947970 0.968140 -0.655235
EOD~EPP.11 0.119582 0.325952 0.519725 0.968244 0.881808 0.449375 1.000000 0.859786 0.533985 0.559117 ... 0.831654 0.806764 -0.229427 0.790734 0.529161 0.540090 0.449015 0.305822 0.293515 0.170135
EOD~EWG.11 0.099310 0.228763 0.568437 0.871560 0.961832 0.408301 0.859786 1.000000 0.670526 0.664514 ... 0.943557 0.830771 -0.232118 0.742909 0.318852 0.593605 0.502004 0.297612 0.325287 0.151760
EOD~EWI.11 -0.482615 0.600429 -0.070294 0.609441 0.722791 -0.245520 0.533985 0.670526 1.000000 0.120915 ... 0.834734 0.285055 0.339941 0.172235 0.536058 0.011219 -0.128671 -0.333470 -0.332439 0.588681
EOD~EWJ.11 0.638039 -0.472528 0.897045 0.553773 0.701312 0.785165 0.559117 0.664514 0.120915 1.000000 ... 0.548281 0.933750 -0.799630 0.838301 -0.249518 0.898535 0.885168 0.713208 0.773878 -0.522373
EOD~EWQ.11 0.133298 0.180010 0.579963 0.830074 0.959346 0.431202 0.813360 0.964138 0.715726 0.656312 ... 0.957204 0.814066 -0.266488 0.718769 0.247110 0.597796 0.527563 0.334069 0.361204 0.126692
EOD~EWU.11 -0.517597 0.742558 -0.122190 0.731485 0.728727 -0.246228 0.672423 0.673132 0.896797 0.075002 ... 0.821343 0.299800 0.386653 0.232442 0.761248 -0.055186 -0.197699 -0.373542 -0.389307 0.690239
EOD~FXB.11 -0.881718 0.785573 -0.815277 -0.010376 -0.047338 -0.858135 -0.098489 -0.097932 0.579336 -0.653444 ... 0.134844 -0.535225 0.884302 -0.575261 0.587652 -0.760275 -0.846165 -0.880120 -0.914238 0.838066
EOD~FXC.11 -0.764207 0.959362 -0.539431 0.409094 0.227213 -0.564192 0.358177 0.235279 0.589891 -0.442876 ... 0.357525 -0.178929 0.757370 -0.200309 0.851797 -0.495695 -0.609985 -0.662610 -0.680516 0.858121
EOD~FXE.11 -0.751606 0.934234 -0.555548 0.300953 0.178322 -0.563021 0.258460 0.225434 0.596749 -0.503935 ... 0.339213 -0.241947 0.777214 -0.284865 0.737566 -0.529786 -0.605043 -0.642191 -0.645994 0.876336
EOD~FXI.11 0.297078 -0.145443 0.540373 0.714284 0.708423 0.410806 0.639610 0.598766 0.337544 0.727237 ... 0.606870 0.740698 -0.400239 0.678539 0.086844 0.568849 0.503931 0.335457 0.325568 -0.138245
EOD~FXY.11 -0.215348 0.587160 -0.254213 0.199016 0.014197 -0.050179 0.253768 0.122715 0.055390 -0.318839 ... 0.073925 -0.079584 0.333551 -0.097576 0.503652 -0.329023 -0.265113 -0.182349 -0.121594 0.515819
EOD~GDX.11 0.235482 0.260990 0.208053 0.412808 0.224695 0.397513 0.504812 0.300218 -0.088655 0.081988 ... 0.191500 0.301804 -0.123559 0.395343 0.384495 0.110501 0.178966 0.287926 0.301502 0.198218
EOD~GLD.11 0.082024 0.397393 0.056570 0.440848 0.253518 0.230380 0.490394 0.331595 0.088205 -0.052560 ... 0.277139 0.204842 0.095048 0.258024 0.409309 -0.047593 0.031461 0.148379 0.136570 0.411007
EOD~IEF.11 0.960502 -0.887364 0.616693 -0.161149 -0.115984 0.772624 -0.123896 -0.142085 -0.614841 0.462854 ... -0.250644 0.309003 -0.805882 0.370280 -0.728639 0.493679 0.688562 0.839497 0.822259 -0.767665
EOD~IYR.11 0.955606 -0.765263 0.856419 0.192315 0.242269 0.915085 0.257621 0.215767 -0.375769 0.704389 ... 0.084490 0.628472 -0.911824 0.716636 -0.502344 0.778801 0.881168 0.949289 0.909633 -0.750171
EOD~JNK.11 0.445421 -0.022564 0.778771 0.820003 0.808507 0.732292 0.866058 0.809576 0.292240 0.766475 ... 0.703327 0.907781 -0.570855 0.920337 0.232295 0.799863 0.729748 0.596253 0.625980 -0.180673
EOD~LQD.11 0.971305 -0.694091 0.866714 0.266759 0.286353 0.964510 0.323333 0.282643 -0.352670 0.710935 ... 0.133318 0.672902 -0.896247 0.742058 -0.450059 0.774881 0.896336 0.957108 0.952372 -0.693004
EOD~SLV.11 -0.354167 0.724506 -0.256180 0.371996 0.187288 -0.141498 0.416171 0.279668 0.235227 -0.240594 ... 0.240493 0.014046 0.403814 0.005046 0.694764 -0.285405 -0.307128 -0.277922 -0.240663 0.644029
EOD~SPY.11 0.842149 -0.641248 0.991233 0.415816 0.517151 0.935603 0.467515 0.502180 -0.123878 0.895104 ... 0.358526 0.848230 -0.927519 0.877744 -0.392401 0.963235 0.991927 0.927931 0.932395 -0.704327
EOD~TIP.11 0.859502 -0.457060 0.753160 0.390606 0.312750 0.916770 0.470817 0.297403 -0.310658 0.603516 ... 0.167172 0.652469 -0.789489 0.734690 -0.167651 0.662339 0.773236 0.855883 0.845312 -0.504442
EOD~TLT.11 0.936742 -0.849735 0.571482 -0.136222 -0.123104 0.731014 -0.105875 -0.155546 -0.601886 0.415551 ... -0.252825 0.277254 -0.752814 0.365316 -0.671258 0.450654 0.638300 0.805797 0.756771 -0.729281
EOD~USO.11 -0.867886 0.986548 -0.664995 0.266093 0.123862 -0.682872 0.227297 0.129020 0.595750 -0.549693 ... 0.270583 -0.315690 0.824281 -0.340152 0.865064 -0.598733 -0.723483 -0.789460 -0.793365 0.881490
EOD~UUP.11 0.675114 -0.898963 0.518790 -0.313187 -0.187701 0.489625 -0.273305 -0.226168 -0.569004 0.479315 ... -0.345297 0.212514 -0.720817 0.252331 -0.710915 0.514331 0.560829 0.577542 0.575982 -0.860378
EOD~VGK.11 -0.037230 0.333239 0.412195 0.872187 0.975473 0.257371 0.831654 0.943557 0.834734 0.548281 ... 1.000000 0.726352 -0.101834 0.634106 0.400804 0.446705 0.351284 0.146801 0.152965 0.296861
EOD~VPL.11 0.536324 -0.215966 0.875397 0.788827 0.852729 0.776831 0.806764 0.830771 0.285055 0.933750 ... 0.726352 1.000000 -0.675383 0.930537 0.013313 0.873740 0.841634 0.672990 0.709966 -0.314635
EOD~VXX.11 -0.900317 0.773717 -0.883305 -0.162012 -0.274274 -0.917891 -0.229427 -0.232118 0.339941 -0.799630 ... -0.101834 -0.675383 1.000000 -0.706347 0.498880 -0.842539 -0.914515 -0.899269 -0.930061 0.808017
EOD~XLB.11 0.597031 -0.238640 0.898652 0.743330 0.759922 0.815172 0.790734 0.742909 0.172235 0.838301 ... 0.634106 0.930537 -0.706347 1.000000 0.051290 0.893472 0.852025 0.745800 0.733184 -0.364143
EOD~XLE.11 -0.613705 0.895370 -0.346392 0.522937 0.333035 -0.346823 0.529161 0.318852 0.536058 -0.249518 ... 0.400804 0.013313 0.498880 0.051290 1.000000 -0.274287 -0.426726 -0.510594 -0.519703 0.697664
EOD~XLF.11 0.689425 -0.509694 0.975872 0.488016 0.596364 0.842485 0.540090 0.593605 0.011219 0.898535 ... 0.446705 0.873740 -0.842539 0.893472 -0.274287 1.000000 0.951554 0.827935 0.840891 -0.633096
EOD~XLK.11 0.845532 -0.644009 0.988677 0.403966 0.504831 0.939693 0.449015 0.502004 -0.128671 0.885168 ... 0.351284 0.841634 -0.914515 0.852025 -0.426726 0.951554 1.000000 0.931938 0.944955 -0.705373
EOD~XLU.11 0.941027 -0.717165 0.910599 0.239615 0.291689 0.947970 0.305822 0.297612 -0.333470 0.713208 ... 0.146801 0.672990 -0.899269 0.745800 -0.510594 0.827935 0.931938 1.000000 0.960408 -0.725899
EOD~CSJ.11 0.934951 -0.713552 0.917660 0.227668 0.308123 0.968140 0.293515 0.325287 -0.332439 0.773878 ... 0.152965 0.709966 -0.930061 0.733184 -0.519703 0.840891 0.944955 0.960408 1.000000 -0.728065
EOD~FXF.11 -0.769985 0.870505 -0.657030 0.224999 0.135446 -0.655235 0.170135 0.151760 0.588681 -0.522373 ... 0.296861 -0.314635 0.808017 -0.364143 0.697664 -0.633096 -0.705373 -0.725899 -0.728065 1.000000

39 rows × 39 columns

b.

We now copy our correlation matrix to a txt file so as to read it later as we did in the tutorial:

In [6]:
corr_matrix_initial.to_csv('output.txt', index = True, header = True, sep = ' ')

In order to construct the edgelist file as in the tutorial we edit the correlation matrix so as to have all possible edge combinations with their correlation as follows:

In [7]:
# reshape the corr_matrix so as to have all possible edge combinations with their correlations
# include only values with absolute value of correlation greater or equal than 0.01
# convert corr_matrix to columns
corr_matrix = corr_matrix[abs(corr_matrix) >= 0.01].stack().reset_index(name = 'correlations')

# remove diagonal line of matrix
corr_matrix = corr_matrix[corr_matrix['level_0'] != corr_matrix['level_1']]

# concatenate the two columns after sorting
corr_matrix['columns'] = corr_matrix.apply(lambda lamda: '-'.join(sorted([lamda['level_0'], lamda['level_1']])), axis = 1)

# drop duplicates and concatenated columns
corr_matrix = corr_matrix.drop_duplicates(['columns']) 
corr_matrix.drop(['columns'], inplace = True, axis = 1)

# final edgelist file
edgelist = corr_matrix
edgelist
Out[7]:
level_0 level_1 correlations
1 EOD~BND.11 EOD~DBC.11 -0.822062
2 EOD~BND.11 EOD~DIA.11 0.794479
3 EOD~BND.11 EOD~EEM.11 0.069905
4 EOD~BND.11 EOD~EFA.11 0.114368
5 EOD~BND.11 EOD~EMB.11 0.912854
... ... ... ...
1396 EOD~XLK.11 EOD~CSJ.11 0.944955
1397 EOD~XLK.11 EOD~FXF.11 -0.705373
1435 EOD~XLU.11 EOD~CSJ.11 0.960408
1436 EOD~XLU.11 EOD~FXF.11 -0.725899
1475 EOD~CSJ.11 EOD~FXF.11 -0.728065

738 rows × 3 columns

We now copy our correlation matrix to a txt file so as to read it later as we did in the tutorial:

In [8]:
edgelist.to_csv('edjelist.txt', index = False, header = False, sep = ' ')

We are now ready to create the graph.

In [9]:
# Create an empty graph structure with no nodes and no edges.
G = nx.Graph() # we use Graph from graphviz
# Read an un-directed graph from a list of edges
G = nx.read_edgelist("edjelist.txt", nodetype = str, data = [('weight', float)])

We now plot the Graph as indicated below and we experiment with several layouts:

In [21]:
plt.subplots(figsize = (20,20))
plt.title("Graph 1 - Circular Layout", fontsize = 20)
nx.draw(G,pos = nx.circular_layout(G), with_labels = True, node_color = 'red', edge_color = 'blue', font_size = 12)
In [20]:
plt.subplots(figsize = (20,20))
plt.title("Graph 1 - Random Layout", fontsize = 20)
nx.draw(G,pos = nx.random_layout(G), with_labels = True, node_color = 'red', edge_color = 'blue', font_size = 12)
In [22]:
plt.subplots(figsize = (20,20))
plt.title("Graph 1 - Spring Layout", fontsize = 20)
nx.draw(G,pos = nx.spring_layout(G), with_labels = True, node_color = 'red', edge_color = 'blue', font_size = 12)
In [23]:
plt.subplots(figsize = (20,20))
plt.title("Graph 1 - Spectral Layout", fontsize = 20)
nx.draw(G,pos = nx.spectral_layout(G), with_labels = True, node_color = 'red', edge_color = 'blue', font_size = 12)

We experimented with the different graph layouts and we chose the circular one (similar to shell). We chose the circular layout as in that layout all of the nodes are connected with each other. We know that all nodes have to be connected with each other, as all of them have a significant correlation with the rest.

With the selected layout, colours and size, we managed to visualise our network better and make it more beautiful.

We also tried random, spring and spectral layout. We rejected even the best of them, which was random, since nodes were positioned randomly and were not clearly visualized.

c.

We are now ready to enhance our plot. We first create a second graph as follows:

In [16]:
# Create an empty graph structure with no nodes and no edges.
G2 = nx.Graph() # we use Graph from graphviz
# Read an un-directed graph from a list of edges
G2 = nx.read_edgelist("edjelist.txt", nodetype = str, data = [('weight', float)])

We now store our weights in order to use them for the thickness of the edges and the size of our nodes.

In [13]:
weights = abs(corr_matrix_initial).sum(axis = 0).to_list() # absolute value of sum of correlations in a node
weights = list(np.asarray(weights) * 20) # we multiple by 20 in order to be more visible
# we use absolute value because if nodes have very high positive and very high negative correlations
# we risk being shown with zero correlations
weights2 = list(nx.get_edge_attributes(G2, 'weight').values())
weights2 = list(np.asarray((weights2)))

We now plot the Graph as indicated below:

In [14]:
plt.subplots(figsize = (20,20))
plt.title("Graph 2", fontsize = 20)
nx.draw(G2, pos = nx.circular_layout(G2), node_size = weights, width = weights2, with_labels = True, node_color = 'red', edge_color = 'blue', font_size = 12)

To sum up, we visualised the different weights through edge thickness and node size.

At the following commands we make use of the graphviz external drawing package for our second graph and call one of its drawing functions.

In [15]:
graphviz_example = Graph(name = 'Graphviz Example', format = 'svg', strict = True)

node_list = list(nx.nodes(G2))

# we restrict only to 9 nodes since the graph 
# would then be too large for effective visualisation
for j in range(0,9):
    for i in range(0, 9):
        if node_list[j] != node_list[i]:
            graphviz_example.edge(node_list[j], node_list[i])

graphviz_example
Out[15]:
Graphviz Example EOD~BND.11 EOD~BND.11 EOD~DBC.11 EOD~DBC.11 EOD~BND.11--EOD~DBC.11 EOD~DIA.11 EOD~DIA.11 EOD~BND.11--EOD~DIA.11 EOD~EEM.11 EOD~EEM.11 EOD~BND.11--EOD~EEM.11 EOD~EFA.11 EOD~EFA.11 EOD~BND.11--EOD~EFA.11 EOD~EMB.11 EOD~EMB.11 EOD~BND.11--EOD~EMB.11 EOD~EPP.11 EOD~EPP.11 EOD~BND.11--EOD~EPP.11 EOD~EWG.11 EOD~EWG.11 EOD~BND.11--EOD~EWG.11 EOD~EWI.11 EOD~EWI.11 EOD~BND.11--EOD~EWI.11 EOD~DBC.11--EOD~DIA.11 EOD~DBC.11--EOD~EEM.11 EOD~DBC.11--EOD~EFA.11 EOD~DBC.11--EOD~EMB.11 EOD~DBC.11--EOD~EPP.11 EOD~DBC.11--EOD~EWG.11 EOD~DBC.11--EOD~EWI.11 EOD~DIA.11--EOD~EEM.11 EOD~DIA.11--EOD~EFA.11 EOD~DIA.11--EOD~EMB.11 EOD~DIA.11--EOD~EPP.11 EOD~DIA.11--EOD~EWG.11 EOD~DIA.11--EOD~EWI.11 EOD~EEM.11--EOD~EFA.11 EOD~EEM.11--EOD~EMB.11 EOD~EEM.11--EOD~EPP.11 EOD~EEM.11--EOD~EWG.11 EOD~EEM.11--EOD~EWI.11 EOD~EFA.11--EOD~EMB.11 EOD~EFA.11--EOD~EPP.11 EOD~EFA.11--EOD~EWG.11 EOD~EFA.11--EOD~EWI.11 EOD~EMB.11--EOD~EPP.11 EOD~EMB.11--EOD~EWG.11 EOD~EMB.11--EOD~EWI.11 EOD~EPP.11--EOD~EWG.11 EOD~EPP.11--EOD~EWI.11 EOD~EWG.11--EOD~EWI.11